Collective Operations for Wide-Area Message Passing Systems Using Dynamically Created Spanning Trees

Abstract

We propose a configuration-free method for performing collective operations efficiently in dynamically changing topologies. Our collective operations are designed so that (1) they perform well when the topology is stable, (2) they complete successfully even when processors join or leave, and (3) they adapt to topology changes. We propose to create adaptive latency-aware spanning trees for short messages and adaptive bandwidth-aware spanning trees for long messages, and to perform collective operations along those trees. We implemented the latency-aware spanning tree on the Phoenix Message Passing Library and confirmed that broadcasts and reductions execute faster than a topology-unaware implementation, although not quite as fast as a static topology-aware implementation. In another experiment, we studied the behavior of our broadcast when some of the processors left and rejoined during computation. Our broadcast temporarily performed worse while the spanning trees were being reconstructed, but it completed successfully and resumed effective execution once the topology change was over.
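To make the idea of a latency-aware spanning tree concrete, the following is a minimal sketch, not the authors' algorithm and not the Phoenix API. It assumes a hypothetical, fully known symmetric latency matrix and uses Prim's minimum-spanning-tree algorithm as a stand-in heuristic; the function names `latency_aware_tree` and `broadcast_order` and the `alive` parameter are illustrative inventions. Rebuilding the tree over the currently live nodes loosely mimics adapting to processors that leave or rejoin.

```python
import heapq

def latency_aware_tree(latency, root, alive=None):
    """Grow a tree from `root` with Prim's algorithm (an assumed stand-in
    heuristic): repeatedly attach the unattached live node that is
    reachable over the lowest-latency link from the tree built so far."""
    nodes = sorted(alive) if alive is not None else list(range(len(latency)))
    parent = {root: None}
    heap = [(latency[root][v], root, v) for v in nodes if v != root]
    heapq.heapify(heap)
    while heap and len(parent) < len(nodes):
        _, u, v = heapq.heappop(heap)
        if v in parent:
            continue  # v was already attached over a cheaper link
        parent[v] = u
        for w in nodes:
            if w not in parent:
                heapq.heappush(heap, (latency[v][w], v, w))
    return parent

def broadcast_order(parent, root):
    """List (sender, receiver) pairs in the order a message would be
    forwarded down the tree, level by level."""
    children = {node: [] for node in parent}
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)
    sends, frontier = [], [root]
    while frontier:
        node = frontier.pop(0)
        for child in children[node]:
            sends.append((node, child))
            frontier.append(child)
    return sends

# Hypothetical topology: nodes 0-1 share one cluster, nodes 2-3 another.
# Latency is 1 inside a cluster and 50 across the wide-area link.
latency = [
    [0, 1, 50, 50],
    [1, 0, 50, 50],
    [50, 50, 0, 1],
    [50, 50, 1, 0],
]

tree = latency_aware_tree(latency, root=0)
print(broadcast_order(tree, root=0))

# Node 2 leaves: rebuild the tree over the remaining live nodes only.
tree_after_leave = latency_aware_tree(latency, root=0, alive={0, 1, 3})
print(broadcast_order(tree_after_leave, root=0))
```

With all four nodes present, the resulting tree crosses the slow inter-cluster link only once, which is the property a topology-aware broadcast is after. Note that a minimum spanning tree minimizes total link latency rather than broadcast completion time, and a real system would have to measure latencies at run time instead of reading them from a table.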


Similar resources

Collective operations for wide-area message passing systems using adaptive spanning trees

We propose a method for wide-area message-passing systems to perform broadcasts and reductions efficiently using latency and bandwidth-aware spanning trees constructed at run-time. These trees are updated when processes join or leave a computation, allowing effective execution to continue. We have implemented our proposal on the Phoenix Message-Passing Library and performed experiments using 16...


Design and Implementation of Adaptive Message Passing Systems for Wide-Area Distributed Computing Environments

Recently, much research has gone into wide-area message passing systems, but more work is necessary so that message passing systems can adapt to wide-area environments by themselves and stop requiring manual configuration. Thus, in this paper, I make two proposals concerning the design and implementation of adaptive message passing systems for wide-area, distributed environments. My first propo...


MPI’s Reduction Operations in Clustered Wide Area Systems

The emergence of meta computers and computational grids makes it feasible to run parallel programs on large-scale, geographically distributed computer systems. Writing parallel applications for such systems is a challenging task which may require changes to the communication structure of the applications. MPI’s collective operations (such as broadcast and reduce) allow for some of these changes...


Generalized Communicators in the Message Passing Interface

We propose extensions to the Message Passing Interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoints between processes. The generalized communicator construct can be used to express a wide range of interesting communication structures, including collective communication opera...


Distributed Algorithms for Constructing Balanced Spanning Trees on System-ranked Process Groups

Parallel programs often express operations on a subset (process group) of all the participating processes or ranks. Subcommunicators in MPI are an example of such process groups. Often, these process groups are used only for simple collective communication (broadcast, reduction, allreduce) over the members of the process group. Current algorithms to create process groups tend to be centralized ...




Publication date: 2005